[mlir] Return vectorized values instead of replacing #144158
Conversation
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-linalg
Author: None (Max191)
Changes: Exposes a new
Full diff: https://github.com/llvm/llvm-project/pull/144158.diff
2 Files Affected:
diff --git a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
index 2eef0a06d0eb4..de09dae24eccf 100644
--- a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
+++ b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
@@ -772,17 +772,26 @@ LogicalResult deallocateGPUPrivateMemory(OpBuilder &, Value /*buffer*/);
bool hasVectorizationImpl(Operation *);
/// Emit a suitable vector form for an operation. If provided,
-/// `inputVectorSizes` are used to vectorize this operation. `inputVectorSizes`
-/// must match the rank of the iteration space of the operation and the sizes
-/// must be smaller or equal than their counterpart interation space sizes, if
-/// static. `inputVectorShapes` also allows the vectorization of operations with
-/// dynamic shapes.
+/// `inputVectorSizes` are used to vectorize this operation.
+/// `inputVectorSizes` must match the rank of the iteration space of the
+/// operation and the input vector sizes must be greater than or equal to
+/// their counterpart iteration space sizes, if static. `inputVectorShapes`
+/// also allows the vectorization of operations with dynamic shapes.
LogicalResult vectorize(RewriterBase &rewriter, Operation *op,
ArrayRef<int64_t> inputVectorSizes = {},
ArrayRef<bool> inputScalableVecDims = {},
bool vectorizeNDExtract = false,
bool flatten1DDepthwiseConv = false);
+/// Vectorize and store new vectorized results in `newResults`, without replacing
+/// the old `op`.
+LogicalResult vectorize(RewriterBase &rewriter, Operation *op,
+ SmallVector<Value> &newResults,
+ ArrayRef<int64_t> inputVectorSizes = {},
+ ArrayRef<bool> inputScalableVecDims = {},
+ bool vectorizeNDExtract = false,
+ bool flatten1DDepthwiseConv = false);
+
/// Emit a suitable vector form for a Copy op with fully static shape.
LogicalResult vectorizeCopy(RewriterBase &builder, memref::CopyOp copyOp);
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index ff28bd7c48342..3efef3af93fa3 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -2522,13 +2522,8 @@ bool mlir::linalg::hasVectorizationImpl(Operation *op) {
tensor::InsertSliceOp>(op);
}
-/// Emit a suitable vector form for an operation. If provided,
-/// `inputVectorSizes` are used to vectorize this operation.
-/// `inputVectorSizes` must match the rank of the iteration space of the
-/// operation and the input vector sizes must be greater than or equal to
-/// their counterpart iteration space sizes, if static. `inputVectorShapes`
-/// also allows the vectorization of operations with dynamic shapes.
LogicalResult mlir::linalg::vectorize(RewriterBase &rewriter, Operation *op,
+ SmallVector<Value> &newResults,
ArrayRef<int64_t> inputVectorSizes,
ArrayRef<bool> inputScalableVecDims,
bool vectorizeNDExtract,
@@ -2558,57 +2553,65 @@ LogicalResult mlir::linalg::vectorize(RewriterBase &rewriter, Operation *op,
}
}
- SmallVector<Value> results;
- auto vectorizeResult =
- TypeSwitch<Operation *, LogicalResult>(op)
- .Case<linalg::LinalgOp>([&](auto linalgOp) {
- // TODO: isaConvolutionOpInterface that can also infer from
- // generic features. Will require stride/dilation attributes
- // inference.
- if (isa<ConvolutionOpInterface>(linalgOp.getOperation())) {
- FailureOr<Operation *> convOr = vectorizeConvolution(
- rewriter, linalgOp, inputVectorSizes, inputScalableVecDims,
- flatten1DDepthwiseConv);
- if (succeeded(convOr)) {
- llvm::append_range(results, (*convOr)->getResults());
- return success();
- }
-
- LDBG("Unsupported convolution can't be vectorized.\n");
- return failure();
- }
-
- LDBG("Vectorize generic by broadcasting to the canonical vector "
- "shape\n");
-
- // Pre-process before proceeding.
- convertAffineApply(rewriter, linalgOp);
-
- // TODO: 'vectorize' takes in a 'RewriterBase' which is up-casted
- // to 'OpBuilder' when it is passed over to some methods like
- // 'vectorizeAsLinalgGeneric'. This is highly problematic: if we
- // erase an op within these methods, the actual rewriter won't be
- // notified and we will end up with read-after-free issues!
- return vectorizeAsLinalgGeneric(rewriter, state, linalgOp, results);
- })
- .Case<tensor::PadOp>([&](auto padOp) {
- return vectorizeAsTensorPadOp(rewriter, padOp, inputVectorSizes,
- results);
- })
- .Case<linalg::PackOp>([&](auto packOp) {
- return vectorizeAsTensorPackOp(rewriter, packOp, inputVectorSizes,
- results);
- })
- .Case<linalg::UnPackOp>([&](auto unpackOp) {
- return vectorizeAsTensorUnpackOp(rewriter, unpackOp,
- inputVectorSizes, results);
- })
- .Case<tensor::InsertSliceOp>([&](auto sliceOp) {
- return vectorizeAsInsertSliceOp(rewriter, sliceOp, inputVectorSizes,
- results);
- })
- .Default([](auto) { return failure(); });
+ return TypeSwitch<Operation *, LogicalResult>(op)
+ .Case<linalg::LinalgOp>([&](auto linalgOp) {
+ // TODO: isaConvolutionOpInterface that can also infer from
+ // generic features. Will require stride/dilation attributes
+ // inference.
+ if (isa<ConvolutionOpInterface>(linalgOp.getOperation())) {
+ FailureOr<Operation *> convOr = vectorizeConvolution(
+ rewriter, linalgOp, inputVectorSizes, inputScalableVecDims,
+ flatten1DDepthwiseConv);
+ if (succeeded(convOr)) {
+ llvm::append_range(newResults, (*convOr)->getResults());
+ return success();
+ }
+
+ LDBG("Unsupported convolution can't be vectorized.\n");
+ return failure();
+ }
+
+ LDBG("Vectorize generic by broadcasting to the canonical vector "
+ "shape\n");
+
+ // Pre-process before proceeding.
+ convertAffineApply(rewriter, linalgOp);
+
+ // TODO: 'vectorize' takes in a 'RewriterBase' which is up-casted
+ // to 'OpBuilder' when it is passed over to some methods like
+ // 'vectorizeAsLinalgGeneric'. This is highly problematic: if we
+ // erase an op within these methods, the actual rewriter won't be
+ // notified and we will end up with read-after-free issues!
+ return vectorizeAsLinalgGeneric(rewriter, state, linalgOp, newResults);
+ })
+ .Case<tensor::PadOp>([&](auto padOp) {
+ return vectorizeAsTensorPadOp(rewriter, padOp, inputVectorSizes,
+ newResults);
+ })
+ .Case<linalg::PackOp>([&](auto packOp) {
+ return vectorizeAsTensorPackOp(rewriter, packOp, inputVectorSizes,
+ newResults);
+ })
+ .Case<linalg::UnPackOp>([&](auto unpackOp) {
+ return vectorizeAsTensorUnpackOp(rewriter, unpackOp, inputVectorSizes,
+ newResults);
+ })
+ .Case<tensor::InsertSliceOp>([&](auto sliceOp) {
+ return vectorizeAsInsertSliceOp(rewriter, sliceOp, inputVectorSizes,
+ newResults);
+ })
+ .Default([](auto) { return failure(); });
+}
+LogicalResult mlir::linalg::vectorize(RewriterBase &rewriter, Operation *op,
+ ArrayRef<int64_t> inputVectorSizes,
+ ArrayRef<bool> inputScalableVecDims,
+ bool vectorizeNDExtract,
+ bool flatten1DDepthwiseConv) {
+ SmallVector<Value> results;
+ LogicalResult vectorizeResult = mlir::linalg::vectorize(
+ rewriter, op, results, inputVectorSizes, inputScalableVecDims,
+ vectorizeNDExtract, flatten1DDepthwiseConv);
if (failed(vectorizeResult)) {
LDBG("Vectorization failed\n");
return failure();
Thanks for sending this! What is the rationale and the end-goal here?
Sure, but without tests upstream, this change feels very arbitrary. What is meant to decide whether or not to use the vectorization output? Could this be a job for a cost model?
I have a use case downstream in IREE, although I haven't sent the code out for it yet. I want to reuse the linalg.generic vectorization implementation for a custom op in IREE, since the vectorization for them is similar. My transformation creates a new generic op with a body similar to the custom op's, and then vectorizes the generic. The results of the generic cannot immediately replace anything, since this is just a step in the vectorization transformation, so I want a way to vectorize and simply get the results of the vectorized operation.
Ideally, I wouldn't need to create a new linalg.generic like this, but the vectorization implementations are not very friendly for doing anything different. Perhaps in the future the vectorization implementations could be refactored, but I don't have the bandwidth to take on that task right now.
Thanks for elaborating - that helps clarify the intent. That said, I don’t see a clear path to testing this upstream. Without tests, this becomes “dead code” as soon as it lands. And dead code - even if small - is a maintenance burden. In general, untested code should be removed (or not merged in the first place). One might argue it’ll be used in IREE, but MLIR maintenance shouldn't depend on context from downstream projects. I also get the impression there may be an alternative way to approach this - I’ve included some more specific concerns and questions below.
True, the Linalg vectorizer’s structure is a bit convoluted - but at its core, it behaves like a pattern (see VectorizationPattern). So "replacement" semantics still seem appropriate here. Also, since this is a Linalg vectorizer, its purpose is to take a
Got it - but this reads as though you're trying to use the Linalg vectorizer as part of a custom lowering pipeline, with something like:
If that's the case, why can’t step (1) happen in a separate transformation (upstream or downstream)? That would allow the existing vectorizer to be used as intended, replacing
OK, to me this implies that you won't be able to avoid some downstream changes for this to work? Could this change also live downstream?
Totally understandable. But even so, I think we need to at least clarify what the longer-term trajectory would be, so that any short-term workaround doesn't create architectural friction. Just to reiterate: I'd like to understand the use case better and help you find a path forward. I do appreciate that this is a relatively small change, but it seems to cut against MLIR's general principles. That said, maybe there are existing examples or infra for invoking patterns without replacement? If so, happy to explore those. We can also ask on Discourse. -Andrzej (*) It also supports
The new callback is directly called in the existing linalg::vectorize transform now, so this new function is tested by all of the vectorization tests in MLIR already. Was there some other type of test that you had in mind?
I suppose it could, although in my use case, it is safer to do this as a single transformation because it ensures that no new tensor ops remain in the IR. One of the requirements of the transformation is that the custom op is bufferized, so I don't want the tensor ops that I inject to remain in the IR after the transformation. If I do the conversion to linalg.generic as a separate transformation, then I will have to rely on separate folders for vector.transfer_read/transfer_write pairs to ensure all tensor ops are folded away after calling linalg::vectorize.
I believe this is not uncommon here in MLIR. The TilingInterface utilities, for example, don't do any replacement; see mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h, lines 152 to 156 (at 414710c).
The callback returns an
I think the reasoning for having transformations like this is that it allows the user of the transform to apply additional transformations to the results of the transform before doing the replacement. That is generally better than transforming after replacing, IMO, since you don't want to replace ops until the transformation is fully complete.
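The "return replacements instead of replacing" style discussed here can be sketched with simplified stand-in types. Everything in this snippet (Op, Replacement, TransformResult, applyTransform) is a hypothetical placeholder; the real utilities use RewriterBase, Operation*, mlir::Value, and result structs like SCFTilingResult.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for IR objects; not real MLIR types.
struct Op { std::string name; };
struct Replacement { std::string name; };

// Result struct in the TilingInterface style: the transform hands back the
// replacement values and lets the caller decide when (or whether) to replace.
struct TransformResult {
  std::vector<Replacement> replacements;
};

// The transform produces new values but performs no replacement itself;
// failure is signaled through the optional, mirroring FailureOr<T>.
std::optional<TransformResult> applyTransform(const Op &op) {
  if (op.name.empty())
    return std::nullopt; // op not supported by this transform
  return TransformResult{{Replacement{op.name + ".vectorized"}}};
}
```

The caller can inspect or further transform `replacements` before committing them to the IR, which is the control this PR is after.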
TLDR: @Max191 I think it is better to return std::optional<VectorizationResult> like other transform methods, but let's see how the discussion goes before you make other changes.
I just came back from vacation and I'm still catching up on the threads. If I read it correctly, there are a few concerns:
- The expectation of the return values from the vectorize() method.
- Test coverage for the new code.
- Test coverage between upstream and downstream projects.
- The longer-term trajectory for vectorization.
(1), (2), and (3) are all important, as others could follow bad PRs in the future, and we should keep the review bar as consistent as we can. I need to chat with @Max191 about downstream changes, but I can share my thoughts, which are orthogonal to the downstream implementation details.
RE (1): I've been thinking about this since I saw other PRs about refactoring "patterns" and "transforms". TLDR is that I'm +1 on returning transformed IR from methods. The line between patterns and functions, to me, is whether it returns the final results or not. People can use functions to build patterns: either just use a single function, or assemble the pieces from multiple functions. This happens not only in the linalg dialect, but also in other dialects. As we expose the vectorize method today, I'd expect it to return std::optional<VectorizeResult>. We have such a structure in other linalg transforms, e.g., Tiling, Promotion, etc., but not vectorize(). I think we had the same intention for the vectorization, but it got lost at some point. Thus, I took a stab at digging into the history.
When we lived in the very old world, we developed most transformations using the pattern-rewriting framework with linalg transform filters (see the RFC for the retirement plan), including tiling, promotion, vectorization, etc. We've spent a lot of time moving core functionality into C++ functions and reusing it everywhere, including transform ops. Tiling is a good example to follow. Here is the oldest commit that shows the change. It returned void in the first place, as that was the first step.
At one point, we had the std::optional<*Result> style, like the other result structs (TilingResult, TileAndFuseResult, etc.):
/// Emit a suitable vector form for a Linalg op with fully static shape.
struct VectorizedLinalgOp {
SmallVector<Value> tensorResults;
VectorizedLinalgOp &operator=(const VectorizedLinalgOp &) = default;
};
Optional<VectorizedLinalgOp> vectorizeLinalgOp(OpBuilder &builder,
Operation *op);
Later, a contributor refactored the handling of the result into a by-reference function argument for some reason (c1a4cd5), and it became:
LogicalResult vectorizeLinalgOp(OpBuilder &builder, Operation *op,
SmallVectorImpl<Value> &newResults);
We still tracked the return values until another refactoring. As you can see in that change, all other transformations return the transformed op except the vectorize() method. I don't find any comments from the review; it may be because we wanted to implement a better vectorizer, so we were being conservative. Or maybe it is just because it looked straightforward in the change itself, e.g., it's easier because you don't need to introduce a new struct, and it'd be very similar to the deprecated one. I don't know.
However, if we get back to the question of the line between patterns and functions, I think we should return transformed IR from the functions; it is the pattern's responsibility to handle the return value and do the further transformation within its scope.
Note: other transforms, like generalizeNamedOp, perform the replacement and return the new ops. I think the vectorize method can follow the same API.
RE (2): If I read the code correctly, the current implementation does not introduce dead code, because everything goes through the old vectorize method, which calls the new vectorize method. It is not obvious, IMO. What I'd argue is: should we return std::optional<SmallVector<Value>> or std::optional<VectorizationResult> instead? It'd break downstream users, but I think it is a reasonable move. What do you think?
RE (3): I think all the code is well-tested upstream, and there is no dead code atm. As I mentioned above, I'd suggest returning std::optional<VectorizationResult>, which makes it clearer. One of the things I like is that upstream has good test coverage for vectorization, so I only need a few tests in the IREE project, e.g., to make sure that vector size inference is working when masking is enabled in IREE. The vector size inference could be IREE-specific, so it's not upstreamed. The way I see it, the core functionality is tested upstream, and IREE just has slightly higher-level tests within the pass scope.
RE (4): I'm happy to see the discussion and happy to provide feedback. There was a discussion a long time ago, but we have not made progress so far. I hope that the above explanation addresses your concern and that this does not become a blocker.
I've been wanting to investigate how to move the vectorization to the next chapter, as we may have more future changes to linalg ops, and people would like to extend the existing vectorizer. My rough idea is that we need an interface; any operation that implements the interface can be vectorized by our new vectorizer system through the interface. Roughly, we'd need three interface methods. One reads data from tensor/memref into vectors and returns the loaded vectors. Another writes vectors into the destination, taking a sequence of vectors. The third is similar to the vectorize() method: it takes the input vectors, applies some computation, and returns the result vectors. E.g., a linalg op would use its body to perform the computation.
However, what's been confusing me is whether we should require vectorizable ops to implement the DPS interface, as we may need to pass input vector sizes for writing vectors. Today, they are well-defined for linalg ops because the inits are passed and we capture the iteration domain sizes in the indexing maps. I'm happy to chat with you offline, or we can prepare an RFC together, and hopefully (4) won't become a blocker for the change.
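The three-method interface idea floated above could look roughly like the following toy sketch. Every name here (VectorizableOpInterface, the Vector alias, AddOp, vectorizeViaInterface) is a hypothetical illustration of the proposal, not an existing MLIR API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a vector value; a real design would use mlir::Value.
using Vector = std::vector<int>;

// Sketch of the proposed interface: read operands, compute, write results.
struct VectorizableOpInterface {
  virtual ~VectorizableOpInterface() = default;
  // Read data from tensor/memref operands into vectors.
  virtual std::vector<Vector> readOperands() = 0;
  // Apply the op's computation on the input vectors, returning the results.
  virtual std::vector<Vector> compute(const std::vector<Vector> &inputs) = 0;
  // Write result vectors into the destination(s).
  virtual void writeResults(const std::vector<Vector> &results) = 0;
};

// Toy elementwise-add op implementing the interface.
struct AddOp : VectorizableOpInterface {
  Vector lhs, rhs, dest;
  AddOp(Vector l, Vector r) : lhs(std::move(l)), rhs(std::move(r)) {}
  std::vector<Vector> readOperands() override { return {lhs, rhs}; }
  std::vector<Vector> compute(const std::vector<Vector> &in) override {
    Vector out(in[0].size());
    for (std::size_t i = 0; i < out.size(); ++i)
      out[i] = in[0][i] + in[1][i];
    return {out};
  }
  void writeResults(const std::vector<Vector> &results) override {
    dest = results[0];
  }
};

// Generic driver: any op implementing the interface gets vectorized the
// same way, which is the reuse the interface is meant to enable.
void vectorizeViaInterface(VectorizableOpInterface &op) {
  op.writeResults(op.compute(op.readOperands()));
}
```

The driver stays op-agnostic; the open question from the comment above (whether writeResults forces a DPS-style contract) shows up here as AddOp having to own its own destination.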
Thank you both for the additional context; that's exactly what I was missing. Special thanks to @hanhanW for the detailed and generous overview of related discussions; I found it extremely helpful 🙏🏻
To quickly clarify my earlier point about "dead code":

LogicalResult mlir::linalg::vectorize(...) {
  SmallVector<Value> results;
  LogicalResult vectorizeResult = mlir::linalg::vectorize(results, ...);
}

back into something like:

LogicalResult mlir::linalg::vectorize(...) {
  SmallVector<Value> results;
  // inline mlir::linalg::vectorize(results, ...);
}

In other words, the new overload of
Thanks for pointing this out; my bad for not paying closer attention to similar patterns elsewhere. It does seem like this change is trying to bring the vectorize API more in line with the structure used in other transformations, e.g.:
That said, as @hanhanW mentioned, that seems inconsistent, and I'd love to understand the rationale. Clarifying that would help confirm that this change is indeed a step toward a more uniform transformation interface, which I agree is a worthwhile goal.
Wouldn’t
That sounds great; I'd be happy to sync offline as well. All in all, I'm happy for you to move forward with this change. That said, my one request would be to ensure that only a single vectorize entry point is exposed. Based on the discussion, the following signature seems sufficient to serve both upstream and downstream needs:

FailureOr<VectorizationResult>
vectorize(RewriterBase &rewriter,
          Operation *op,
          ArrayRef<int64_t> inputVectorSizes = {},
          ArrayRef<bool> inputScalableVecDims = {},
          bool vectorizeNDExtract = false,
          bool flatten1DDepthwiseConv = false);

This would also:
How does this sound?
That SGTM! I was thinking the same thing about it. Big thank you to Hanhan for adding great additional context!
Nice discussion! I haven't had the time to dive deep into the details, but I'm generally +1 on making the vectorizer more modular and reusable. We talked about introducing interfaces to decouple the information of the iteration space to be vectorized from the actual ops, which is something we could explore. My only concern would be if we head towards implementing an ad-hoc "conversion rewriter" where IR modifications have to live in data structures on the side, or we set the expectation that the vectorizer can't materialize changes directly in the IR. That would be difficult to justify and maintain. Happy to provide feedback on a specific example, if you have one you can share.
Honestly, I don't know which is better or what the rationale is. Maybe different contributors have different preferences. If I had to choose, I'd not perform the replacement, because it provides more control to users. I'm not sure whether the replacement would become an issue when people build complicated patterns and dialect conversion or a type converter is involved.
Yes, it can be:

if (failed(vectorize(...)))
  return failure();

But yeah,
Thanks, it sounds good to me!
Signed-off-by: Max Dawkins <[email protected]>
I have updated the implementation to return a FailureOr<VectorizationResult>.
  rewriter.eraseOp(op);
-  return success();
+  return VectorizationResult({results});
Since you're using braced initialization lists, I think this can be VectorizationResult{results} or just {results}.
/// also allows the vectorization of operations with dynamic shapes. Returns
/// a VectorizationResult containing the results of the vectorized op, or
/// failure if the transformation fails.
FailureOr<VectorizationResult>
I'd move "Returns ..." to the first sentence, which is a high-level overview of the function. And maybe remove "Emit a suitable vector form for an operation."?
Thanks for the updates; this is roughly what I expected 🙂
I've posted a few general discussion points, but they're non-blocking. The main issue I'd like to flag is naming (VectorizationHookResult vs. VectorizationResult), but naming is hard 🙂
By the way, @Max191, I noticed you force-pushed and rewrote the history. Please try to avoid that, and keep in mind:
From experience, many of us do rebase and force-push, but without squashing (i.e., we preserve individual commits). I'm personally fine with that, as long as the original commits are retained - otherwise it becomes harder to follow the review context, especially for post-commit review.
Not a big deal, just a kind request for future PRs 🙂
And as for rebasing strategies in general - that's a topic better suited for Discourse.
/// Transformation information returned after vectorizing.
struct VectorizationResult {
  /// Results of the vectorization transform to replace the original operation.
  SmallVector<Value> replacements;
};
- Not quite "transformation information" ;-)
- Do we need a vector of results? I don't want to sound "limiting" or call for a bigger refactor in this PR, but from what I recall, the vectorizer always gets an Op with one result and returns a vectorized result. Is this true? I just suspect that a single Value would be sufficient. This could be something for a follow-up PR.
- VectorizationResult -> VectorizedResult? This name is a bit too close to VectorizationHookResult and the distinction is not well documented. My suggestion is not that much better, but it "hints" that it's just the actual vectorized Op (as opposed to "vectorized Op + status flag"). I will try to come back with sth better :)
Updates the linalg::vectorize function to return a FailureOr<VectorizationResult> containing the values to replace the original operation, instead of directly replacing the original operation. This aligns better with the style of transforms used with the TilingInterface, and gives more control to users over the lowering, since it allows for additional transformation of the IR before replacement.
There was already a VectorizationResult defined, which was used for the internal vectorize implementation using CustomVectorizationHooks, so the old struct is renamed to VectorizationHookResult.
Note for integration: The replacement of the original operation is now the responsibility of the caller, so wherever linalg::vectorize is used, the caller must also do rewriter.replaceOp(op, vectorizeResults->replacements).
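The caller-side pattern from the integration note can be sketched with simplified stand-in types. Value, VectorizationResult, vectorize, and vectorizeAndReplace below are hypothetical placeholders that only model the new contract; real callers work with RewriterBase, Operation*, and rewriter.replaceOp:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Stand-ins for illustration only; not the real MLIR classes.
struct Value { std::string name; };
struct VectorizationResult { std::vector<Value> replacements; };

// Models the new API: vectorization returns replacement values (or failure)
// instead of replacing the original op itself.
std::optional<VectorizationResult> vectorize(const Value &op) {
  if (op.name.empty())
    return std::nullopt; // vectorization failed
  return VectorizationResult{{Value{op.name + ".vec"}}};
}

// Caller-side pattern after this PR: check for failure, optionally transform
// the results further, then perform the replacement (the return here stands
// in for rewriter.replaceOp(op, results->replacements)).
std::vector<Value> vectorizeAndReplace(const Value &op) {
  auto results = vectorize(op);
  if (!results)
    return {}; // leave the original op untouched on failure
  return results->replacements;
}
```

The key migration point is that forgetting the final replacement step now leaves the original op in the IR, which is why every existing call site must be updated.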